Boosting Cross-Domain Speech Recognition with Self-Supervision

Authors

Abstract

The cross-domain performance of automatic speech recognition (ASR) can be severely hampered by the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at both the acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data. However, these self-supervisions also face performance degradation under mismatched domain distributions, which previous work fails to address. This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: firstly, we design a dual-branch PL method to decrease the sensitivity to erroneous pseudo-labels; secondly, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; thirdly, we introduce a two-step PL approach to incorporate target-domain linguistic knowledge, thus generating more accurate target-domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts cross-domain performance and significantly outperforms previous approaches.
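The abstract names an uncertainty-aware confidence filtering step for pseudo-labels but does not describe how it works. The sketch below illustrates only the general idea of such a filter and is not the authors' method. The `Hypothesis` structure, the geometric-mean confidence, the use of repeated stochastic decodes as an uncertainty estimate, and both thresholds are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Pseudo-label for one unlabeled target-domain utterance (hypothetical structure)."""
    utt_id: str
    token_probs: List[float]          # posterior of each emitted token
    dropout_confidences: List[float]  # confidences from repeated stochastic decodes


def confidence(token_probs: List[float]) -> float:
    """Length-normalized confidence: geometric mean of the token posteriors."""
    if not token_probs:
        return 0.0
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def uncertainty(samples: List[float]) -> float:
    """Population standard deviation of the confidence across stochastic decodes."""
    if len(samples) < 2:
        return 0.0
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def filter_pseudo_labels(
    hyps: List[Hypothesis],
    min_conf: float = 0.9,   # assumed threshold, not taken from the paper
    max_unc: float = 0.05,   # assumed threshold, not taken from the paper
) -> List[Hypothesis]:
    """Keep hypotheses that are both confident and stable across decodes."""
    return [
        h for h in hyps
        if confidence(h.token_probs) >= min_conf
        and uncertainty(h.dropout_confidences) <= max_unc
    ]
```

A filter of this kind rejects a hypothesis that contains a single low-probability token, and also one whose confidence varies widely between decodes even when its average confidence looks high.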

Similar resources

Boosting Automatic Speech Recognition through Articulatory Inversion

This paper explores whether articulatory features predicted from speech acoustics through inversion may be used to boost the recognition of context-dependent units when combined with acoustic features. For this purpose, we performed articulatory inversion on a corpus containing acoustic and electromagnetic articulography recordings from a single speaker. We then compared the performance of an H...

Boosting localized binary features for speech recognition

In a recent work, the framework of Boosted Binary Features (BBF) was proposed for ASR. In this framework, a small set of localized binary-valued features are selected using the Discrete Adaboost algorithm. These features are then integrated into a standard HMM-based system using either single layer perceptrons (SLP) or multilayer perceptrons (MLP). The features were found to perform significant...

Speech emotion recognition with cross-lingual databases

In this paper, we investigate cross-lingual automatic speech emotion recognition. The basic idea is that since the emotion recognition system is based on the acoustic features only, it is possible to combine data in different languages to improve the recognition accuracy. We begin with the construction of a Mandarin database of emotional speech, which is similar to the well-known Berlin Databas...

Object Localization with Boosting and Weak Supervision for Generic Object Recognition

This paper deals, for the first time, with an analysis of localization capabilities of weakly supervised categorization systems. Most existing categorization approaches have been tested on databases, which (a) either show the object(s) of interest in a very prominent way so that their localization can hardly be judged from these experiments, or (b) at least the learning procedure was done with ...

Cross-Domain Speech Disfluency Detection

We build a model for speech disfluency detection based on conditional random fields (CRFs) using the Switchboard corpus. This model is then applied to a new domain without any adaptation. We show that a technique for detecting speech disfluencies based on Integer Linear Programming (ILP) (Georgila, 2009) significantly outperforms CRFs. In particular, in terms of F-score and NIST Error Rate the ...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2023.3301230